Biostatistics For Dummies (Monika Wahi John Pezzullo)

© John Wiley & Sons, Inc.

FIGURE 3-1: Distribution of number of private and public airports in 2011 in the population (of 50 states and the District of

Columbia), and four different samples of 20 states from the same population.

As shown in Figure 3-1, when comparing the sample distributions to the distribution of the population

using the histograms, you can see there are differences. Sample 2 looks much more like the population

than Sample 4. However, they are all valid samples in that they were randomly selected from the

population. Each sample is only an approximation of the true population distribution. In addition, the mean

and standard deviation of each sample are likely close to the mean and standard deviation of the

population, but not equal to them. (For a refresher on mean and standard deviation, see Chapter 9.) These

characteristics of sampling error, where valid samples from the population are almost always

somewhat different from the population, are true of any random sample.
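
To make sampling error concrete, here is a minimal Python sketch that mimics the setup in Figure 3-1: a population of 51 values and four random samples of 20. The population values are randomly generated placeholders, not the actual airport counts, and the function name draw_sample_summaries and the seeds are assumptions chosen only for this illustration.

```python
import random
import statistics


def draw_sample_summaries(population, sample_size, n_samples, seed=0):
    """Draw simple random samples and return (mean, sd) for each one."""
    rng = random.Random(seed)
    summaries = []
    for _ in range(n_samples):
        # Sampling without replacement, like picking 20 distinct states.
        sample = rng.sample(population, sample_size)
        summaries.append((statistics.mean(sample), statistics.stdev(sample)))
    return summaries


# Hypothetical population of 51 counts (NOT the real airport data).
population_rng = random.Random(42)
population = [population_rng.randint(20, 2000) for _ in range(51)]

pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)  # population SD, since all 51 values are known
summaries = draw_sample_summaries(population, sample_size=20, n_samples=4)

print(f"Population: mean={pop_mean:.1f}, sd={pop_sd:.1f}")
for i, (sample_mean, sample_sd) in enumerate(summaries, start=1):
    print(f"Sample {i}:   mean={sample_mean:.1f}, sd={sample_sd:.1f}")
```

If you run this sketch, each sample's mean and standard deviation should land near the population values without matching them exactly, which is the behavior the paragraph above describes. Changing the seed gives four different, equally valid samples.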

Digging into probability distributions

As described in the preceding section, samples differ from populations because of random

fluctuations. Because these random fluctuations fall into patterns, statisticians can describe

quantitatively how these random fluctuations behave using mathematical equations called probability

distribution functions. Probability distribution functions describe how likely it is that random

fluctuations will exceed any given magnitude. A probability distribution can be represented in several

ways:

As a mathematical equation that calculates the chance that a fluctuation will be of a certain

magnitude. Using calculus, this function can be integrated, which turns it into another related

function that calculates the probability that a fluctuation will be at least as large as a certain

magnitude.

As a graph of the distribution, which looks and works much like a histogram.

As a table of values indicating how likely it is that random fluctuations will exceed a certain

magnitude.
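
The representations listed above can be sketched in a few lines of Python. This example assumes, purely for illustration, that the fluctuations follow a standard normal distribution (mean 0, standard deviation 1); the text here does not name a particular distribution. The helper names density, prob_at_least, and tail_table are hypothetical. The graph form is not drawn, but plotting density over a range of values would produce it.

```python
from statistics import NormalDist

# Assumed distribution for illustration only: standard normal.
dist = NormalDist(mu=0.0, sigma=1.0)


def density(x):
    """The equation form: the height of the distribution curve at x."""
    return dist.pdf(x)


def prob_at_least(x):
    """The integrated form: probability that a fluctuation is at least x."""
    return 1.0 - dist.cdf(x)


def tail_table(magnitudes):
    """The table form: (magnitude, probability of being at least that large)."""
    return [(m, prob_at_least(m)) for m in magnitudes]


table = tail_table([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])

print("magnitude  density  P(at least this large)")
for magnitude, prob in table:
    print(f"{magnitude:9.1f}  {density(magnitude):7.4f}  {prob:.4f}")
```

Under this assumed distribution, the probabilities in the table shrink as the magnitude grows, so large fluctuations are less likely than small ones.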

In the following sections, we break down two types of distributions: those that describe fluctuations in

your data, and those that you encounter when performing statistical tests.

Distributions that describe your data